Memory-Augmented Temporal Dynamic Learning for Action Recognition
Authors
Abstract
Similar resources
Learning Representative Temporal Features for Action Recognition
In this paper we present a novel video classification methodology that aims to recognize different categories of third-person videos efficiently. The idea is to track motion in videos and extract both short-term and long-term features from the motion time series by training a multichannel one-dimensional Convolutional Neural Network (1DCNN). The positive point about our method is that we only...
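The core operation described above, a multichannel 1D convolution over a motion time series, can be sketched in NumPy as below. The two-channel trace and the kernel values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def conv1d_multichannel(series, kernels, bias=0.0):
    """Valid-mode multichannel 1D convolution (sliding dot product).

    series:  (C, T) multichannel motion time series
    kernels: (C, K) one kernel per channel, summed into one output map
    returns: (T - K + 1,) feature map
    """
    C, T = series.shape
    _, K = kernels.shape
    out = np.zeros(T - K + 1)
    for t in range(T - K + 1):
        # Elementwise product of the window with all kernels, summed
        # over both channels and kernel taps.
        out[t] = np.sum(series[:, t:t + K] * kernels) + bias
    return out

# Hypothetical 2-channel trace (e.g. x/y displacement per frame).
trace = np.array([[0., 1., 2., 3., 4.],
                  [4., 3., 2., 1., 0.]])
kern = np.array([[1., -1.],   # finite-difference filter per channel
                 [1., -1.]])
response = conv1d_multichannel(trace, kern)
# Here the opposite per-channel motions cancel, giving a zero response.
```

A real model would stack several such filter banks with nonlinearities and learn the kernel weights; this sketch only shows the windowed multichannel operation itself.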
Appendix: Asynchronous Temporal Fields for Action Recognition
1.1. Description of the CRF We create a CRF which predicts activity, object, etc., for every frame in the video. For reasoning about time, we create a fully-connected temporal CRF, referred to as Asynchronous Temporal Field in the text. That is, unlike a linear-chain CRF for temporal modelling (the discriminative counterpart to Hidden Markov Models), each node depends on the state of every othe...
Spatio-temporal SURF for Human Action Recognition
In this paper, we propose a new spatio-temporal descriptor called ST-SURF. The latter is based on a novel combination of the Speeded-Up Robust Features (SURF) and the optical flow. The Hessian detector is employed to find all interest points. To reduce the computation time, we propose a new methodology for video segmentation into Frame Packets (FPs), based on interest-point trajectory tracking. W...
Second-order Temporal Pooling for Action Recognition
Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth-order (max) or first-order (average) statistics are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...
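A minimal sketch of the second-order idea, assuming the per-clip predictions are stacked into a matrix: instead of taking the max or mean over clips, pool the averaged outer product of the prediction vectors, which keeps co-activation patterns between class scores. The example values are hypothetical:

```python
import numpy as np

def second_order_pool(clip_preds):
    """Second-order temporal pooling of per-clip prediction vectors.

    clip_preds: (N, D) array, one D-dim prediction per short clip.
    Returns the flattened upper triangle of the D x D averaged
    outer product, a fixed-size video-level descriptor.
    """
    N, D = clip_preds.shape
    second = clip_preds.T @ clip_preds / N   # (D, D) second-order statistic
    iu = np.triu_indices(D)                  # symmetric: keep upper triangle
    return second[iu]

# Hypothetical softmax scores from three clips of one video (D = 2).
preds = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.7, 0.3]])
desc = second_order_pool(preds)   # length D*(D+1)/2 = 3
```

Max and average pooling would each return a length-D vector here; the second-order descriptor additionally encodes how the class scores vary together across clips.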
Long-term Temporal Convolutions for Action Recognition
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level of a few video frames, failing to model actions at their full temporal extent. In this work we learn video representa...
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2019
ISSN: 2374-3468, 2159-5399
DOI: 10.1609/aaai.v33i01.33019167